244 research outputs found

Retrieving descriptive phrases from large amounts of free text

Authors: Joho H., Sanderson M.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2000

This paper presents a system that retrieves descriptive phrases of proper nouns from free text. Sentences containing the specified noun are ranked using a technique based on pattern matching, word counting, and sentence location. No domain-specific knowledge is used. Experiments show that the system ranks highly those sentences that contain phrases describing or defining the query noun. In contrast to existing methods, this system does not use parsing techniques but still achieves high levels of accuracy. From the results of a large-scale experiment, it is speculated that the success of this simpler method is due to the large quantities of free text being searched. Parallels are drawn between this work and recent findings in the very large corpus track of TREC.
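The three ranking signals named in the abstract (pattern matching, word counting, sentence location) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the pattern templates, the weights, the `key_terms` set used to stand in for word counting, and the function names `score_sentence` and `rank_sentences`.

```python
import re

# Hypothetical pattern templates; "{noun}" is replaced by the query noun.
PATTERNS = [
    r"{noun}, (a|an|the)\b",          # apposition: "X, a city in ..."
    r"{noun} (is|was) (a|an|the)\b",  # copula: "X is a ..."
    r"such as {noun}",                # "cities such as X"
]

def score_sentence(sentence, position, noun, key_terms):
    """Combine three signals into one score (the weights are assumptions)."""
    score = 0.0
    # Signal 1: pattern matching around the query noun.
    for template in PATTERNS:
        pattern = template.replace("{noun}", re.escape(noun))
        if re.search(pattern, sentence, re.IGNORECASE):
            score += 2.0
            break
    # Signal 2: word counting, approximated by overlap with key terms.
    words = set(re.findall(r"\w+", sentence.lower()))
    score += 0.5 * len(words & key_terms)
    # Signal 3: sentence location, favouring earlier sentences.
    score += 1.0 / (1 + position)
    return score

def rank_sentences(sentences, noun, key_terms):
    """Keep only sentences containing the noun and sort by descending score."""
    scored = [
        (score_sentence(s, i, noun, key_terms), s)
        for i, s in enumerate(sentences)
        if noun.lower() in s.lower()
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

sentences = [
    "Sheffield hosted the meeting.",
    "Sheffield, a city in northern England, has two universities.",
    "Many visitors arrive by train.",
]
ranked = rank_sentences(sentences, "Sheffield", {"city", "england"})
for score, sentence in ranked:
    print(f"{score:.2f}  {sentence}")
```

No parsing is involved: each signal is a surface-level check on the sentence, which matches the abstract's description of a simpler, parser-free method.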

CiteSeerX

Crossref

White Rose Research Online

Slicing and dicing the information space using local contexts

Authors: Joho H., Jose J.M.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2006

In recent years there has been growing interest in faceted grouping of documents for Interactive Information Retrieval (IIR). It is suggested that faceted grouping can offer a more flexible way of browsing a collection than clustering. However, the success of faceted grouping seems to rely on sufficient knowledge of the collection structure. In this paper we propose an approach based on the local contexts of query terms, which is inspired by the interaction of faceted search and browsing. The use of local contexts is appealing since it requires less knowledge of the collection than existing approaches. A task-based user study was carried out to investigate the effectiveness of our interface across tasks of varied complexity. The results suggest that local contexts can be exploited as a source for search result browsing in IIR, and that our interface appears to facilitate different aspects of the search process depending on task complexity. The implications of the evaluation methodology using high-complexity tasks are also discussed.

Crossref

Enlighten

Sheffield and terabyte-scale test collections

Authors: Joho H., Sanderson M.

White Rose Research Online

Concept-based Interactive Query Expansion Support Tool (CIQUEST)

Authors: Beaulieu M., Joho H., Sanderson M.
Publication venue: Resource: The Council for Museums, Archives and Libraries
Publication date: 01/01/2003

This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources, including those of the World Wide Web. More specifically, the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. The results of phases 1 and 2 informed the design of the test system, and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.

White Rose Research Online

Automatically organising images using concept hierarchies

Authors: Clough P., Joho H., Sanderson M.

In this paper we discuss the use of concept hierarchies for image retrieval. Concept hierarchies are an approach to automatically organizing a set of documents based upon a set of concepts derived from the documents themselves. Co-occurrence between terms associated with image captions and a statistical relation called subsumption are used to generate term clusters, which are organized hierarchically. Previously, the approach has been studied for document retrieval, and results have shown that automatically generating hierarchies can help users with their search task. In this paper we present an implementation of concept hierarchies for image retrieval, together with a preliminary ad hoc evaluation. Although our approach requires more investigation, initial results from a prototype system are promising and would appear to provide a useful summary of the search results.
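The subsumption relation mentioned in the abstract can be sketched from term co-occurrence counts. The abstract does not give the exact test, so the condition below is an assumption based on a commonly cited formulation: term x subsumes term y when P(x|y) is at least 0.8 and P(y|x) is below 1. The function name `subsumes`, the threshold default, and the toy captions are hypothetical.

```python
def subsumes(x, y, doc_terms, threshold=0.8):
    """Return True if term x subsumes term y over a list of term sets.

    Assumed condition: P(x|y) >= threshold and P(y|x) < 1, i.e. x appears in
    most documents that contain y, but y does not appear wherever x does.
    """
    docs_x = {i for i, terms in enumerate(doc_terms) if x in terms}
    docs_y = {i for i, terms in enumerate(doc_terms) if y in terms}
    if not docs_x or not docs_y:
        return False  # a term that never occurs cannot take part
    both = len(docs_x & docs_y)
    p_x_given_y = both / len(docs_y)
    p_y_given_x = both / len(docs_x)
    return p_x_given_y >= threshold and p_y_given_x < 1.0

# Toy image captions, each reduced to a set of terms.
captions = [
    {"animal", "dog"},
    {"animal", "cat"},
    {"animal", "dog", "puppy"},
    {"animal"},
]

print(subsumes("animal", "dog", captions))  # the general term is the parent
print(subsumes("dog", "animal", captions))  # the reverse does not hold
print(subsumes("dog", "puppy", captions))
```

Applying the test to every term pair yields parent-child links, which can then be arranged into a hierarchy with the more general terms near the top.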

White Rose Research Online

Spatio-textual indexing for geographical search on the web

Authors: Joho H., Jones C.B., Sanderson M., Vaid S.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

Many web documents refer to specific geographic localities, and many people include geographic context in queries to web search engines. Standard web search engines treat geographical terms in the same way as other terms. This can result in failure to find relevant documents that refer to the place of interest using alternative related names, such as those of included or nearby places. This can be overcome by associating text indexing with spatial indexing methods that exploit geo-tagging procedures to categorise documents with respect to geographic space. We describe three methods for spatio-textual indexing, based on: multiple spatially indexed text indexes; attaching spatial indexes to the document occurrences of a text index; and merging text index access results with results of access to a spatial index of documents. These schemes are compared experimentally with a conventional text index search engine, using a collection of geo-tagged web documents, and are shown to be able to compete in speed and storage performance with pure text indexing.

CiteSeerX

Crossref

Online Research @ Cardiff

White Rose Research Online

Document frequency and term specificity

Authors: Joho H., Sanderson M.
Publication date: 01/01/2007

Document frequency is used in various applications in Information Retrieval and other related fields. An assumption frequently made is that the document frequency represents a level of the term’s specificity. However, empirical results to support this assumption are limited. Therefore, a large-scale experiment was carried out, using multiple corpora, to gain further insight into the relationship between document frequency and term specificity. The results show that the assumption holds only at the very specific levels that cover the majority of the vocabulary. The results also show that a larger corpus estimates specificity more accurately. However, co-occurrence information is shown to be effective for improving accuracy when only a small corpus is available.
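As a small illustration of the assumption under test, document frequency can be computed and turned into a specificity estimate with the standard inverse document frequency (IDF) formula. This is a sketch of the general idea only, not the paper's experimental setup; the toy corpus, the whitespace tokenisation, and the function names are assumptions.

```python
import math
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # a set counts each term once per document
    return df

def idf(term, df, n_docs):
    """Inverse document frequency: a higher value suggests a more specific term."""
    if df[term] == 0:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(n_docs / df[term])

docs = [
    "retrieval of documents",
    "retrieval of images",
    "spatial indexing of documents",
]
df = document_frequency(docs)

for term in ("of", "retrieval", "images"):
    print(term, df[term], round(idf(term, df, len(docs)), 3))
```

Under the assumption described in the abstract, a term with a low document frequency such as "images" is treated as more specific than one that occurs in every document, such as "of".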

CiteSeerX

RMIT Research Repository

White Rose Research Online